Fast Learning VIEWNET Architectures for Recognizing 3-D Objects from Multiple 2-D Views
Authors
Abstract
The recognition of 3-D objects from sequences of their 2-D views is modeled by a family of self-organizing neural architectures, called VIEWNET, that use View Information Encoded With NETworks. VIEWNET incorporates a preprocessor that generates a compressed but 2-D invariant representation of an image, a supervised incremental learning system that classifies the preprocessed representations into 2-D view categories whose outputs are combined into 3-D invariant object categories, and a working memory that makes a 3-D object prediction by accumulating evidence from 3-D object category nodes as multiple 2-D views are experienced. The simplest VIEWNET achieves high recognition scores without the need to explicitly code the temporal order of 2-D views in working memory. Working memories are also discussed that save memory resources by implicitly coding temporal order in terms of the relative activity of 2-D view category nodes, rather than as explicit 2-D view transitions. Variants of the VIEWNET architecture may also be used for scene understanding by using a preprocessor and classifier that can determine both What objects are in a scene and Where they are located. The present VIEWNET preprocessor includes the CORT-X 2 filter, which discounts the illuminant, regularizes and completes figural boundaries, and suppresses image noise. This boundary segmentation is rendered invariant under 2-D translation, rotation, and dilation by use of a log-polar transform. The invariant spectra undergo Gaussian coarse coding to further reduce noise and 3-D foreshortening effects, and to increase generalization. These compressed codes are input into the classifier, a supervised learning system based on the fuzzy ARTMAP algorithm. Fuzzy ARTMAP learns 2-D view categories that are invariant under 2-D image translation, rotation, and dilation as well as 3-D image transformations that do not cause a predictive error.
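As a rough illustration of why a log-polar transform helps with rotation and dilation, the sketch below checks that rotating and scaling a point about the origin becomes a plain shift in (log radius, angle) coordinates, the same shift for every point. The function names and sample values are hypothetical, and this is not the CORT-X 2 preprocessor itself; it only demonstrates the coordinate property, assuming the image has already been centered so that translation is handled separately.

```python
import math

def to_log_polar(x, y):
    """Map a nonzero point (x, y) to (log radius, angle in radians)."""
    return math.log(math.hypot(x, y)), math.atan2(y, x)

def rotate_and_dilate(x, y, angle, scale):
    """Rotate (x, y) about the origin by `angle`, then scale by `scale`."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)

def log_polar_shift(point, angle, scale):
    """Return how far a point moves in log-polar coordinates
    when it is rotated by `angle` and dilated by `scale`."""
    log_r0, phi0 = to_log_polar(*point)
    log_r1, phi1 = to_log_polar(*rotate_and_dilate(*point, angle, scale))
    # Wrap the angular difference into (-pi, pi].
    dphi = math.atan2(math.sin(phi1 - phi0), math.cos(phi1 - phi0))
    return log_r1 - log_r0, dphi

# Two different points undergo the same shift: (log 2, 0.5).
shift_a = log_polar_shift((3.0, 4.0), angle=0.5, scale=2.0)
shift_b = log_polar_shift((-1.0, 2.0), angle=0.5, scale=2.0)
```

Because every point shifts by the same amount, a representation that ignores shifts along the log-radius and angle axes is unaffected by rotation and dilation of the input.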
Evidence from sequences of 2-D view categories converges at 3-D object nodes that generate a response invariant under changes of 2-D view. These 3-D object nodes input to a working memory that accumulates evidence over time to improve object recognition. In the simplest working memory, each occurrence (nonoccurrence) of a 2-D view category increases (decreases) the corresponding node's activity in working memory. The maximally active node is used to predict the 3-D object. Recognition is studied with noisy and clean images using slow and fast learning. Slow learning at the fuzzy ARTMAP map field is adapted to learn the conditional probability of the 3-D object given the selected 2-D view category. VIEWNET is demonstrated on an MIT Lincoln Laboratory database of 128x128 2-D views of aircraft with and without additive noise. A recognition rate of up to 90% is achieved with one 2-D view and of up to 98.5% correct with three 2-D views. The properties of 2-D view and 3-D object category nodes are compared with those of cells in monkey inferotemporal cortex.
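The simplest working memory described in the abstract can be sketched as a set of counters, one per 3-D object node. The version below is a minimal illustration under stated assumptions, not the authors' equations: the step sizes `gain` and `decay`, the floor at zero, and the object labels are all hypothetical.

```python
def accumulate_evidence(predictions, objects, gain=1.0, decay=0.5):
    """Accumulate per-view predictions into object-node activities.

    predictions: object labels predicted from successive 2-D views.
    objects:     all candidate 3-D object labels.
    gain, decay: assumed step sizes (not taken from the paper).
    Returns the most active label and the final activities.
    """
    activity = {name: 0.0 for name in objects}
    for predicted in predictions:
        for name in activity:
            if name == predicted:
                # Occurrence: raise the node's activity.
                activity[name] += gain
            else:
                # Nonoccurrence: lower it, floored at zero (an assumption).
                activity[name] = max(0.0, activity[name] - decay)
    # The maximally active node gives the 3-D object prediction.
    best = max(activity, key=activity.get)
    return best, activity

# Three views, one of which is misclassified by the view classifier.
best, activity = accumulate_evidence(
    ["object_a", "object_b", "object_a"],
    ["object_a", "object_b", "object_c"],
)
```

In this run the single conflicting view is outvoted: `object_a` ends with the highest activity, which mirrors how accumulating evidence over several views can correct an error made on one view.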
Similar resources
VIEWNET Architectures for Invariant 3-D Object Learning and Recognition from Multiple 2-D Views
The recognition of 3-D objects from sequences of their 2-D views is modeled by a family of self-organizing neural architectures, called VIEWNET, that use View Information Encoded With NETworks. VIEWNET incorporates a preprocessor that generates a compressed but 2-D invariant representation of an image, a supervised incremental learning system (Fuzzy ARTMAP) that classifies the prepr...
Fast-learning VIEWNET architectures for recognizing three-dimensional objects from multiple two-dimensional views
The recognition of three-dimensional (3-D) objects from sequences of their two-dimensional (2-D) views is modeled by a family of self-organizing neural architectures, called VIEWNET, that use View Information Encoded With NETworks. VIEWNET incorporates a preprocessor that generates a compressed but 2-D invariant representation of an image, a supervised incremental learning system that...
Spatiotemporal information during unsupervised learning enhances viewpoint invariant object recognition.
Recognizing objects is difficult because it requires both linking views of an object that can be different and distinguishing objects with similar appearance. Interestingly, people can learn to recognize objects across views in an unsupervised way, without feedback, just from the natural viewing statistics. However, there is intense debate regarding what information during unsupervised learning...
Utilizing Temporal Associations for View-based 3-D Object Recognition
We propose an architecture for the recognition of three-dimensional objects on the basis of viewer centered representations and temporal associations. Motivated by biological findings and by successful computational implementations we have chosen a viewer centered representation scheme. In contrast to other implementations, special attention is paid to the temporal order of the views, which prov...
Fast 3-D Object Recognition using Feature Based Aspect-Trees
Olaf Munkelt, Christoph Zierl. Technische Universität München, Institut für Informatik, Prof. Dr. B. Radig, 80290 München, Germany. Abstract: This contribution focuses on the recognition of a priori known 3-D objects in single 2-D images. The underlying model is embedded in the domain of CAD-based vision using a viewer-centered approach to generate ...